Parallelizing Frequent Itemset Mining with FP-Trees
نویسندگان
چکیده
A new scheme to parallelize frequent itemset mining algorithms is proposed. By using the extended conditional databases and k-prefix search space partitioning, our new scheme can create more parallel tasks with better balanced execution times. An implementation of the new scheme with FP-trees is presented. The results of the experimental evaluation showing the increased speedup are presented.
منابع مشابه
Concurrent Processing of Frequent Itemset Queries Using FP-Growth Algorithm
Discovery of frequent itemsets is a very important data mining problem with numerous applications. Frequent itemset mining is often regarded as advanced querying where a user specifies the source dataset and pattern constraints using a given constraint model. A significant amount of research on frequent itemset mining has been done so far, focusing mainly on developing faster complete mining al...
متن کاملFP-Bonsai: The Art of Growing and Pruning Small FP-Trees
In the context of mining frequent itemsets, numerous strategies have been proposed to push several types of constraints within the most well known algorithms. In this paper, we integrate the recently proposed ExAnte data reduction technique within the FP-growth algorithm. Together, they result in a very efficient frequent itemset mining algorithm that effectively exploits monotone constraints.
متن کاملAccelerating Closed Frequent Itemset Mining by Elimination of Null Transactions
The mining of frequent itemsets is often challenged by the length of the patterns mined and also by the number of transactions considered for the mining process. Another acute challenge that concerns the performance of any association rule mining algorithm is the presence of „null‟ transactions. This work proposes a closed frequent itemset mining algorithm viz., Closed Frequent Itemset Mining a...
متن کاملThree Strategies for Concurrent Processing of Frequent Itemset Queries Using FP-Growth
Frequent itemset mining is often regarded as advanced querying where a user specifies the source dataset and pattern constraints using a given constraint model. Recently, a new problem of optimizing processing of sets of frequent itemset queries has been considered and two multiple query optimization techniques for frequent itemset queries: Mine Merge and Common Counting have been proposed and ...
متن کاملParallelizing Frequent Itemset Mining Process using High Performance Computing
Data is growing at an enormous rate and mining this data is becoming a herculean task. Association Rule mining is one of the important algorithms used in data mining and mining frequent itemset is a crucial step in this process which consumes most of the processing time. Parallelizing the algorithm at various levels of computation will not only speed up the process but will also allow it to han...
متن کامل